Improving Load Balance and Query Throughput of Distributed IR Systems
Abstract
As the number of queries grows over time, an Information Retrieval (IR) system must provide a high query processing rate, i.e. high query throughput. IR systems use three types of data partitioning: term-based, document-based, and hybrid. In document-based and hybrid partitioning, the query is sent to all nodes, so a high level of parallelism is achieved but query throughput is low. In term-based partitioning, a given query is divided into sub-queries and each sub-query is directed to the relevant node. This provides high query throughput and concurrency but poor parallelism and load balance. In this paper, the Moderate Distributed IR System (MDIRS) is proposed to improve the query throughput and load balance of hybrid partitioning. MDIRS inherits the advantage of document-based partitioning, in that it provides a moderate level of parallelism, and the advantage of term-based partitioning, in that it provides a moderate level of query throughput and load balance. The results show that MDIRS improved the query throughput and the total query response time of hybrid partitioning by 64% over the baseline system.
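The routing difference between document-based and term-based partitioning described in the abstract can be illustrated with a minimal sketch. It is not the MDIRS design, which the abstract does not detail. The function names and the character-sum rule for assigning terms to nodes are assumptions made only for this example.

```python
from collections import defaultdict


def route_document_based(query_terms, num_nodes):
    """Document-based partitioning: each node indexes its own subset of
    documents, so the full query is broadcast to every node."""
    return {node: list(query_terms) for node in range(num_nodes)}


def term_owner(term, num_nodes):
    """Hypothetical term-to-node assignment (an assumption for this sketch):
    a deterministic character-sum hash modulo the number of nodes."""
    return sum(ord(ch) for ch in term) % num_nodes


def route_term_based(query_terms, num_nodes):
    """Term-based partitioning: the query is split into sub-queries, and each
    sub-query goes only to the node that owns those terms."""
    plan = defaultdict(list)
    for term in query_terms:
        plan[term_owner(term, num_nodes)].append(term)
    return dict(plan)


if __name__ == "__main__":
    query = ["load", "balance", "query"]
    print(route_document_based(query, 4))  # all 4 nodes receive the whole query
    print(route_term_based(query, 4))      # only the owning nodes are contacted
```

In this toy setup the broadcast plan occupies every node for each query, while the term-based plan contacts at most as many nodes as there are query terms. Nodes left idle can serve other queries concurrently, but a node that owns popular terms receives a disproportionate share of the work, which is the load-balance problem the abstract refers to.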
Similar resources
Analyzing the Load Balance of Term-based Partitioning
In parallel information retrieval (IR) systems, where a large-scale collection is indexed and searched, the query response time is limited by the time of the slowest node in the system. Thus, distributing the load equally across the nodes is a very important issue. There are two main methods for collection indexing, namely document-based and term-based indexing. In term-based partitioning, the terms of the global ind...
EM-KDE: A locality-aware job scheduling policy with distributed semantic caches
In modern query processing systems, the caching facilities are distributed and scale with the number of servers. To maximize the overall system throughput, the distributed system should balance the query loads among servers and also leverage cached results. In particular, leveraging distributed cached data is becoming more important as many systems are being built by connecting many small heter...
Multiple query scheduling for distributed semantic caches
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially when dealing with computationally expensive queries. The scheduling policies must be designed to take into consideration the dyn...
Caching in Multi-Agent based Architecture for Distributed Information Retrieval Systems
Caching is an effective technique for improving performance in databases and Information Retrieval (IR) systems. Traditional IR systems access the collection indices to perform searches. Such searches on large corpora for often-repeated queries can be computationally redundant. In addition, querying remote sources can be expensive because of large communication overheads and frequent inavailabilit...
Multi Objective Optimization Placement of DG Problem for Different Load Levels on Distribution Systems with Purpose Reduction Loss, Cost and Improving Voltage Profile Based on DAPSO Algorithm
Along with the economic growth of countries, which leads to increased energy requirements, the problem of power quality and network reliability has received more attention, and in recent decades we have witnessed a noticeable growing trend of distributed generation (DG) sources in distribution networks. Occurrence of DG in distribution systems, in addition to changing the utilization of these sys...